
    Automatic de-identification of textual documents in the electronic health record: a review of recent research

    Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
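
    As a rough illustration of the pattern-matching family of methods the review covers, the sketch below scrubs a few easily patterned PHI types with regular expressions. The patterns and the MRN format are illustrative assumptions, not taken from any reviewed system; person names, the PHI type all systems target, generally require dictionaries or machine learning rather than regexes alone.

        import re

        # Illustrative patterns for a few easily regex-able PHI types;
        # real systems combine many more patterns, dictionaries, and ML models.
        PHI_PATTERNS = {
            "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
            "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
            "MRN": re.compile(r"\bMRN[:\s]*\d{6,8}\b"),  # hypothetical MRN format
        }

        def scrub(text):
            """Replace each matched PHI span with a bracketed category tag."""
            for label, pattern in PHI_PATTERNS.items():
                text = pattern.sub("[" + label + "]", text)
            return text

        print(scrub("Seen 03/14/2009, MRN: 1234567, call 555-867-5309."))
        # -> Seen [DATE], [MRN], call [PHONE].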

    Actuation of Micro-Optomechanical Systems Via Cavity-Enhanced Optical Dipole Forces

    We demonstrate a new type of optomechanical system employing a movable, micron-scale waveguide evanescently coupled to a high-Q optical microresonator. Micron-scale displacements of the waveguide are observed for milliwatt-level (mW) optical input powers. Measurement of the spatial variation of the force on the waveguide indicates that it arises from a cavity-enhanced optical dipole force due to the stored optical field of the resonator. This force is used to realize an all-optical tunable filter operating with sub-mW control power. A theoretical model of the system shows the maximum achievable force to be independent of the intrinsic Q of the optical resonator and to scale inversely with the cavity mode volume, suggesting that such forces may become even more effective as devices approach the nanoscale. Comment: 4 pages, 5 figures. High resolution version available at (http://copilot.caltech.edu/publications/CEODF_hires.pdf). For associated movie, see (http://copilot.caltech.edu/research/optical_forces/index.htm).
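
    For orientation only (the paper's own model should be consulted for its exact result), a standard textbook form of the dispersive optical force in such cavity systems, written in LaTeX, is:

        % Force from the dispersive shift of the cavity resonance frequency
        % \omega_c with waveguide position x, for stored optical energy U:
        F = -\frac{U}{\omega_c}\,\frac{d\omega_c}{dx}
        % On resonance, U is set by the dropped power and photon lifetime:
        U = P_d \, \tau, \qquad \tau = \frac{Q_L}{\omega_c}

    Qualitatively, shrinking the mode volume concentrates the stored field near the waveguide and increases the per-photon coupling d\omega_c/dx, consistent with the abstract's claim that the achievable force scales inversely with mode volume.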

    Clinical narrative analytics challenges

    Precision medicine, or evidence-based medicine, is based on the extraction of knowledge from medical records to provide individuals with the appropriate treatment at the appropriate moment according to the patient's features. Despite the efforts to use clinical narratives for clinical decision support, many challenges still have to be faced today, such as multilinguality, diversity of terms and formats across services, acronyms, and negation, to name but a few. The same problems arise when analyzing narratives in the literature, whose analysis would provide physicians and researchers with useful highlights. In this talk we will analyze challenges, solutions, and open problems, and will review several frameworks and tools that are able to perform NLP over free text to extract medical entities by means of a Named Entity Recognition (NER) process. We will also present a framework we have developed to extract and validate medical terms. In particular, we present two use cases: (i) extraction of medical entities from a set of infectious disease description texts provided by MedlinePlus, and (ii) identification of stroke scales in clinical narratives written in Spanish.
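
    As a minimal sketch of the NER step described in the talk (not the authors' framework), the snippet below runs a generic spaCy pipeline over free text; the model name is a placeholder assumption, and a clinical or Spanish-language model would be substituted in practice.

        import spacy

        # Placeholder general-purpose English model; clinical or Spanish
        # models would replace it for the use cases described in the talk.
        nlp = spacy.load("en_core_web_sm")

        doc = nlp("Patient admitted on 12 March with suspected Lyme disease.")
        for ent in doc.ents:
            # Each recognized entity span carries its text and predicted label.
            print(ent.text, ent.label_)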

    Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease

    Background: Natural Language Processing (NLP) systems can be used for specific Information Extraction (IE) tasks such as extracting phenotypic data from the electronic medical record (EMR). These data are useful for translational research and are often found only in free-text clinical notes. A key required step for IE is the manual annotation of clinical corpora and the creation of a reference standard for (1) training and validation tasks and (2) focusing and clarifying NLP system requirements. These tasks are time consuming, expensive, and require considerable effort on the part of human reviewers. Methods: Using a set of clinical documents from the VA EMR for a particular use case of interest, we identify specific challenges and present several opportunities for annotation tasks. We demonstrate specific methods using an open-source annotation tool, a customized annotation schema, and a corpus of clinical documents for patients known to have a diagnosis of Inflammatory Bowel Disease (IBD). We report clinician annotator agreement at the document, concept, and concept attribute level. We estimate concept yield in terms of annotated concepts within specific note sections and document types. Results: Annotator agreement at the document level for documents that contained concepts of interest for IBD, using the estimated Kappa statistic (95% CI), was very high at 0.87 (0.82, 0.93). At the concept level, F-measure ranged from 0.61 to 0.83. However, agreement varied greatly at the specific concept attribute level. For this particular use case (IBD), the clinical documents producing the highest concept yield per document included GI clinic notes and primary care notes. Within the various types of notes, the highest concept yield was in sections representing patient assessment and history of presenting illness. Ancillary service documents and family history and plan note sections produced the lowest concept yield. Conclusion: Challenges include defining and building appropriate annotation schemas, adequately training clinician annotators, and determining the appropriate level of information to be annotated. Opportunities include narrowing the focus of information extraction to use-case-specific note types and sections, especially in cases where NLP systems will be used to extract information from large repositories of electronic clinical note documents.
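
    The agreement statistics reported above are standard; a minimal sketch computing them with scikit-learn on invented toy labels (not the study's data):

        from sklearn.metrics import cohen_kappa_score, f1_score

        # Toy document-level judgments from two annotators:
        # 1 = document contains an IBD concept of interest, 0 = it does not.
        annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
        annotator_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

        print("kappa:", cohen_kappa_score(annotator_a, annotator_b))
        # Treating annotator A as the reference standard:
        print("F1:", f1_score(annotator_a, annotator_b))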

    Optimising medication data collection in a large-scale clinical trial

    © 2019 Lockery et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Objective: Pharmaceuticals play an important role in clinical care. However, in community-based research, medication data are commonly collected as unstructured free text, which is prohibitively expensive to code for large-scale studies. The ASPirin in Reducing Events in the Elderly (ASPREE) study developed a two-pronged framework to collect structured medication data for 19,114 individuals. ASPREE provides an opportunity to determine whether medication data can be cost-effectively collected and coded, en masse, from the community using this framework. Methods: The ASPREE framework of a type-to-search box with automated coding and linked free-text entry was compared to the traditional method of free-text-only collection and post hoc coding. Reported medications were classified according to their method of collection and analysed by Anatomical Therapeutic Chemical (ATC) group. The relative cost of collecting medications was determined by calculating the time required for database set-up and medication coding. Results: Overall, 122,910 participant structured medication reports were entered using the type-to-search box and 5,983 were entered as free text. Free-text data contributed 211 unique medications not present in the type-to-search box. Spelling errors and unnecessary provision of additional information were among the top reasons why medications were reported as free text. The cost per medication using the ASPREE method was approximately USD 0.03, compared with USD 0.20 per medication for the traditional method. Conclusion: Implementation of this two-pronged framework is a cost-effective alternative to free-text-only data collection in community-based research. The higher initial set-up costs of this combined method are justified by long-term cost effectiveness and the scientific potential for analysis and discovery gained through the collection of detailed, structured medication data.
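
    A minimal sketch of the two-pronged capture logic described above; the mini ATC dictionary and the exact-match rule are invented for illustration, and the real type-to-search box and coding dictionary are far larger.

        # Hypothetical mini dictionary mapping searchable names to ATC codes.
        ATC_LOOKUP = {
            "aspirin": "B01AC06",
            "atorvastatin": "C10AA05",
            "metformin": "A10BA02",
        }

        def record_medication(entry):
            """Return a structured report when the entry matches the
            type-to-search dictionary; otherwise fall back to free text
            for post hoc coding."""
            key = entry.strip().lower()
            if key in ATC_LOOKUP:
                return {"medication": key, "atc": ATC_LOOKUP[key], "source": "structured"}
            return {"medication": entry, "atc": None, "source": "free_text"}

        print(record_medication("Aspirin"))
        print(record_medication("asprin 100mg daily"))  # misspelling -> free text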

    Assessing the accuracy of an inter-institutional automated patient-specific health problem list

    Background: Health problem lists are a key component of electronic health records and are instrumental in the development of decision-support systems that encourage best practices and optimal patient safety. Most health problem lists require initial clinical information to be entered manually and few integrate information across care providers and institutions. This study assesses the accuracy of a novel approach to create an inter-institutional automated health problem list in a computerized medical record (MOXXI) that integrates three sources of information for an individual patient: diagnostic codes from medical services claims from all treating physicians, therapeutic indications from electronic prescriptions, and single-indication drugs. Methods: Data for this study were obtained from 121 general practitioners and all medical services provided for 22,248 of their patients. At the opening of a patient's file, all health problems detected through medical service utilization or single-indication drug use were flagged to the physician in the MOXXI system. Each newly arising health problem was presented as 'potential', and physicians were prompted to specify whether the health problem was valid (Y) or not (N), or whether they preferred to reassess its validity at a later time. Results: A total of 263,527 health problems, representing 891 unique problems, were identified for the group of 22,248 patients. Medical services claims contributed the majority of problems identified (77%), followed by therapeutic indications from electronic prescriptions (14%) and single-indication drugs (9%). Physicians actively chose to assess 41.7% (n = 106,950) of health problems. Overall, 73% of the problems assessed were considered valid; 42% originated from medical service diagnostic codes, 11% from single-indication drugs, and 47% from prescription indications. Twelve percent of problems identified through other treating physicians were considered valid, compared to 28% identified through study physician claims. Conclusion: Automation of an inter-institutional problem list added over half of all validated problems to the health problem list, of which 12% were generated by conditions treated by other physicians. Automating the integration of existing information sources provides timely access to accurate and relevant health problem information. It may also accelerate the uptake and use of electronic medical record systems.
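
    A schematic sketch of how the three evidence sources might be merged into a queue of 'potential' problems with provenance for physician review; the field names and codes are invented placeholders, not MOXXI's actual data model.

        # Each source proposes (patient_id, problem_code) pairs; invented examples.
        claims_dx      = {("p1", "I10"), ("p1", "E11")}  # claims diagnostic codes
        rx_indications = {("p1", "E11"), ("p1", "J45")}  # e-prescription indications
        single_use_rx  = {("p1", "J45")}                 # single-indication drugs

        def potential_problems(sources):
            """Union the sources, keeping provenance, for Y/N/later review."""
            merged = {}
            for name, pairs in sources.items():
                for patient, code in pairs:
                    merged.setdefault((patient, code), set()).add(name)
            return merged

        merged = potential_problems({"claims": claims_dx,
                                     "rx_indication": rx_indications,
                                     "single_indication_drug": single_use_rx})
        for (patient, code), provenance in merged.items():
            print(patient, code, sorted(provenance), "-> status: potential")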

    Measuring diversity in medical reports based on categorized attributes and international classification systems

    Background: Narrative medical reports do not use standardized terminology and often provide insufficient information for statistical processing and medical decision making. The objectives of the paper are to propose a method for measuring diversity in medical reports written in any language, to compare diversities in narrative and structured medical reports, and to map attributes and terms to selected classification systems. Methods: A new method based on a general concept of f-diversity is proposed for measuring the diversity of medical reports in any language. The method is based on categorized attributes recorded in narrative or structured medical reports and on international classification systems. Values of categories are expressed by terms. Using SNOMED CT and ICD-10, we map attributes and terms to predefined codes. We use f-diversities of the Gini-Simpson and Number of Categories types to compare diversities of narrative and structured medical reports. The comparison is based on attributes selected from the Minimal Data Model for Cardiology (MDMC). Results: We compared diversities of 110 Czech narrative medical reports and 1,119 Czech structured medical reports. Selected categorized attributes of MDMC mostly had different numbers of categories and used different terms in narrative and structured reports. We found more than 60% of MDMC attributes in SNOMED CT. We showed that attributes in narrative medical reports had greater diversity than the same attributes in structured medical reports. Further, we replaced each value of a category (term) used for attributes in narrative medical reports by the closest term and category used in MDMC for structured medical reports. We found that relative Gini-Simpson diversities in structured medical reports were significantly smaller than those in narrative medical reports, except for the "Allergy" attribute. Conclusions: Terminology in narrative medical reports is not standardized. Therefore, it is nearly impossible to map values of attributes (terms) to codes of known classification systems. High diversity in narrative medical report terminology makes computer processing more difficult than for structured medical reports, and some information may be lost during this process. Setting a standardized terminology would help healthcare providers to have complete and easily accessible information about patients, which would result in better healthcare.
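
    The Gini-Simpson index used above is standard (one minus the sum of squared category proportions); a minimal sketch computing it, alongside the Number of Categories measure, on invented toy terms:

        from collections import Counter

        def gini_simpson(terms):
            """1 - sum(p_i^2): chance two random reports use different terms."""
            counts = Counter(terms)
            total = sum(counts.values())
            return 1.0 - sum((n / total) ** 2 for n in counts.values())

        # Toy smoking-status terms as they might appear in the two report types.
        narrative  = ["smoker", "non-smoker", "ex-smoker", "stopped smoking", "smokes"]
        structured = ["smoker", "non-smoker", "ex-smoker", "smoker", "non-smoker"]

        for name, terms in [("narrative", narrative), ("structured", structured)]:
            print(name, "categories:", len(set(terms)),
                  "Gini-Simpson:", round(gini_simpson(terms), 2))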

    Data-driven approach for creating synthetic electronic medical records

    Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results, and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3,000 patients in the 4-11 year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs, and the errors were subsequently rectified. Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, and prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4-11 year age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
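
    A skeletal sketch of the three-step generation flow described above (identity generation, care-pattern identification, adaptation); the fields and care patterns are invented placeholders, not the paper's mined patterns.

        import random

        random.seed(7)  # reproducible toy output

        # Step 1: synthetic patient identity and basic information.
        def make_patient(pid):
            return {"id": pid,
                    "age": random.randint(4, 11),
                    "sex": random.choice(["F", "M"])}

        # Step 2: care patterns that would be mined from real EMR data for
        # similar health problems; hard-coded placeholders here.
        CARE_PATTERNS = [
            ["clinic_visit", "cbc_order", "cbc_result"],
            ["clinic_visit", "chest_xray_order", "chest_xray_result"],
        ]

        # Step 3: adapt a care pattern to each synthetic patient.
        def make_record(pid):
            patient = make_patient(pid)
            patient["events"] = random.choice(CARE_PATTERNS)
            return patient

        for pid in range(3):
            print(make_record(pid))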